AITopics | reward system

reward system

ReDit: Reward Dithering for Improved LLM Policy Optimization

Neural Information Processing Systems | Jun-12-2026, 07:34:31 GMT

DeepSeek-R1 has successfully enhanced Large Language Model (LLM) reasoning capabilities through its rule-based reward system. Although it is a "perfect" reward system that effectively mitigates reward hacking, such reward functions are often discrete. Our experimental observations suggest that discrete rewards can lead to gradient anomalies, unstable optimization, and slow convergence. To address this issue, we propose ReDit (Reward Dithering), a method that dithers the discrete reward signal by adding simple random noise. With this perturbed reward, exploratory gradients are continuously provided throughout the learning process, enabling smoother gradient updates and accelerating convergence.
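As a rough illustration of the dithering idea described in the abstract above, the following Python sketch adds zero-mean noise to a group of discrete rewards. The Gaussian noise, the `sigma` value, the function names, and the mean-centred advantage are all assumptions made for illustration; the abstract only states that simple random noise is added, so this should not be read as the authors' implementation.

```python
import random


def dither_rewards(rewards, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to each discrete reward.

    Assumption: Gaussian noise with standard deviation `sigma`; the
    abstract does not specify the noise distribution.
    """
    if rng is None:
        rng = random.Random()
    return [r + rng.gauss(0.0, sigma) for r in rewards]


def group_advantages(rewards):
    """Mean-centred advantages within a group of sampled responses.

    Assumption: a simple group baseline, used here only to show why
    identical rewards give no learning signal.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# Every sampled answer was graded wrong, so the discrete rewards are
# identical and the mean-centred advantages are all zero.
discrete = [0.0, 0.0, 0.0, 0.0]
flat = group_advantages(discrete)

# After dithering, the rewards differ slightly and the advantages are nonzero.
dithered = dither_rewards(discrete, sigma=0.05, rng=random.Random(0))
perturbed = group_advantages(dithered)
```

In this toy setting the undithered group contributes no gradient signal at all, while the dithered group yields small advantages that still sum to zero.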

large language model, machine learning, natural language, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hamsters run on wheels for a surprisingly joyful reason

Even wild animals enjoy a good wheel. Turns out, that midnight "workout" might not be boredom or restlessness after all.

artificial intelligence, garland, physics popular science video space, (11 more...)

Popular Science

Country: North America > United States (0.47)

Genre: Research Report > New Finding (0.35)

Industry:

Leisure & Entertainment (0.47)
Education (0.47)
Law > Statutes (0.36)
Information Technology > Security & Privacy (0.36)

Technology: Information Technology > Artificial Intelligence (0.35)

General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks

Shahriar, Fahim, Wang, Cheryl, Azimi, Alireza, Vasan, Gautham, Elanwar, Hany Hamed, Mahmood, A. Rupam, Bellinger, Colin

arXiv.org Artificial Intelligence | Oct-9-2025

Goal-conditioned reinforcement learning (GCRL) allows agents to learn diverse objectives using a unified policy. The success of GCRL, however, is contingent on the choice of goal representation. In this work, we propose a mask-based goal representation system that provides object-agnostic visual cues to the agent, enabling efficient learning and superior generalization. In contrast, existing goal representation methods, such as target state images, 3D coordinates, and one-hot vectors, face issues of poor generalization to unseen objects, slow convergence, and the need for special cameras. Masks can be processed to generate dense rewards without requiring error-prone distance calculations. Learning with ground truth masks in simulation, we achieved 99.9% reaching accuracy on training and unseen test objects. Our proposed method can be utilized to perform pick-up tasks with high accuracy, without using any positional information of the target. Moreover, we demonstrate learning from scratch and sim-to-real transfer applications using two different physical robots, utilizing pretrained open vocabulary object detection models for mask generation.

I. INTRODUCTION

Many real-world robotics tasks, such as sorting objects in e-commerce warehouses or visual navigation to a target site, involve solving a multi-goal problem. These tasks require the agent to act in a particular way among numerous options to achieve various desired outcomes.
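The abstract above notes that masks can be processed into dense rewards without distance calculations. One way this might look is sketched below in Python, under the assumption that the target's mask covers more of the camera image as the agent approaches it. The function name `mask_coverage_reward` and the coverage-fraction formula are hypothetical and are not taken from the paper.

```python
def mask_coverage_reward(mask):
    """Return the fraction of pixels that belong to the target mask.

    `mask` is a list of rows of 0/1 values. Assumption: larger coverage
    means the agent is closer to the target, so coverage can serve as a
    dense reward in the range 0.0 to 1.0.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    covered = sum(1 for row in mask for px in row if px)
    return covered / total


# Target far away: a single pixel of a 4x4 image is masked.
far = [[0] * 4 for _ in range(4)]
far[1][1] = 1

# Target closer: a 2x2 block of the same image is masked.
near = [[0] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        near[y][x] = 1

far_reward = mask_coverage_reward(far)
near_reward = mask_coverage_reward(near)
```

The reward grows smoothly with coverage and needs no 3D position of the target, which is the property the abstract emphasizes.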

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2510.06277

Country: North America > Canada (0.46)

Genre: Research Report (0.65)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration

Dai, Gaole, Jiang, Shiqi, Cao, Ting, Yang, Yuqing, Li, Yuanchun, Tan, Rui, Li, Mo, Qiu, Lili

arXiv.org Artificial Intelligence | Sep-29-2025

Reward is critical to the evaluation and training of large language models (LLMs). However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is often unavailable, and static trajectory-based LLM-as-a-Judge approaches suffer from limited accuracy. To address these challenges, we propose ProRe, a proactive reward system that leverages a general-purpose reasoner and domain-specific evaluator agents (actors). The reasoner schedules targeted state probing tasks, which the evaluator agents then execute by actively interacting with the environment to collect additional observations. This enables the reasoner to assign more accurate and verifiable rewards to GUI agents. Empirical results on over 3K trajectories demonstrate that ProRe improves reward accuracy and F1 score by up to 5.3% and 19.4%, respectively. Furthermore, integrating ProRe with state-of-the-art policy agents yields a success rate improvement of up to 22.4%.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.21823

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models

Yan, Yuchen, Jiang, Jin, Ren, Zhenbang, Li, Yijun, Cai, Xudong, Liu, Yang, Xu, Xin, Zhang, Mengdi, Shao, Jian, Shen, Yongliang, Xiao, Jun, Zhuang, Yueting

arXiv.org Artificial Intelligence | Sep-26-2025

Large reasoning models such as OpenAI o1 and DeepSeek-R1 have achieved remarkable performance in the domain of reasoning. A key component of their training is the incorporation of verifiable rewards within reinforcement learning (RL). However, existing reward benchmarks do not evaluate reference-based reward systems, leaving researchers with limited understanding of the accuracy of verifiers used in RL. In this paper, we introduce two benchmarks, VerifyBench and VerifyBench-Hard, designed to assess the performance of reference-based reward systems. These benchmarks are constructed through meticulous data collection and curation, followed by careful human annotation to ensure high quality. Current models still show considerable room for improvement on both VerifyBench and VerifyBench-Hard, especially smaller-scale models. Furthermore, we conduct a thorough analysis of evaluation results, offering insights for understanding and developing reference-based reward systems. Our proposed benchmarks serve as effective tools for guiding improvements in verifier accuracy and in the reasoning capabilities of models trained via RL on reasoning tasks.

benchmark, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2505.15801

Country:

Asia (0.68)
North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

'Don't ask what AI can do for us, ask what it is doing to us': are ChatGPT and co harming human intelligence?

The Guardian | Apr-19-2025, 12:00:39 GMT

Imagine for a moment you are a child in 1941, sitting the common entrance exam for public schools with nothing but a pencil and paper. You read the following: "Write, for no more than a quarter of an hour, about a British author." Today, most of us wouldn't need 15 minutes to ponder such a question. We'd get the answer instantly by turning to AI tools such as Google Gemini, ChatGPT or Siri. Offloading cognitive effort to artificial intelligence has become second nature, but with mounting evidence that human intelligence is declining, some experts fear this impulse is driving the trend.

critical thinking, gerlich, intelligence, (16 more...)

The Guardian

Country:

Europe > United Kingdom (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
(2 more...)

Industry:

Education (0.87)
Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (1.00)

All games with loot boxes will be rated M or higher in Australia

PCWorld | Sep-20-2024, 15:20:26 GMT

Loot boxes in video games and mobile games have become less of a flashpoint for controversy, but a few years ago they were a major target of ire for both gamers and regulators. The wheels of justice (or at least of legislation) turn slowly, but they do turn, and Australia is making a big move in this sector. Starting this Sunday, any game sold in Australia with loot boxes will be rated either M (Mature) or R 18 (Restricted). For the uninitiated, loot boxes are essentially digital blind boxes. Gamers buy a loot box (or several) in the hopes of finding rare items, weapons, or character outfits. But actually getting what you want is pure chance… and chance that's artificially slimmed down to an incredible longshot for the most rare and desirable items.

australia, game classification, loot box, (7 more...)

PCWorld

Country:

Oceania > Australia (0.98)
North America > United States (0.06)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence (0.40)

Evolution of Rewards for Food and Motor Action by Simulating Birth and Death

Kanagawa, Yuji, Doya, Kenji

arXiv.org Artificial Intelligence | Jun-21-2024

The reward system is one of the fundamental drivers of animal behaviors and is critical for survival and reproduction. Despite its importance, the problem of how the reward system has evolved is underexplored. In this paper, we try to replicate the evolution of biologically plausible reward functions and investigate how environmental conditions affect the shape of evolved rewards. For this purpose, we developed a population-based decentralized evolutionary simulation framework, where agents maintain their energy level to live longer and produce more children. Each agent inherits its reward function from its parent subject to mutation and learns to get rewards via reinforcement learning throughout its lifetime. Our results show that biologically reasonable positive rewards for food acquisition and negative rewards for motor action can evolve from randomly initialized ones. However, we also find that the rewards for motor action diverge into two modes: largely positive and slightly negative. The emergence of positive motor action rewards is surprising because it can make agents too active and inefficient in foraging. In environments with poor and poisonous foods, the evolution of rewards for less important foods tends to be unstable, while rewards for normal foods are still stable. These results demonstrate the usefulness of our simulation environment and energy-dependent birth and death model for further studies of the origin of reward systems.
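The inheritance step described in the abstract above, where a child receives its parent's reward function subject to mutation, could be sketched in Python as follows. The linear reward with one weight for food and one for motor action, the Gaussian mutation, and all names here are assumptions for illustration only; the abstract does not give the paper's actual parameterization.

```python
import random


def mutate_reward_weights(parent, sigma=0.1, rng=None):
    """Return a child's reward weights: the parent's plus Gaussian noise.

    Assumption: additive Gaussian mutation with standard deviation `sigma`.
    The parent's dictionary is left unchanged.
    """
    if rng is None:
        rng = random.Random()
    return {name: w + rng.gauss(0.0, sigma) for name, w in parent.items()}


def reward(weights, food_eaten, action_magnitude):
    """Hypothetical linear reward over food intake and motor effort."""
    return weights["food"] * food_eaten + weights["action"] * action_magnitude


# A parent that values food and is penalized for motor action.
parent = {"food": 1.0, "action": -0.5}
child = mutate_reward_weights(parent, sigma=0.1, rng=random.Random(0))

# Reward the parent would assign for eating 2 units with effort 2.
parent_reward = reward(parent, food_eaten=2, action_magnitude=2)
```

Each agent would then learn with its own inherited weights during its lifetime, so selection acts on the reward function rather than directly on behavior.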

agent, evolution, experiment, (16 more...)

arXiv.org Artificial Intelligence

2406.15016

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > Massachusetts (0.04)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.46)
Health & Medicine (0.46)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments

Machado, Mateus G., Melo, João G., Zanchettin, Cleber, Braga, Pedro H. M., Cunha, Pedro V., Barros, Edna N. S., Bassani, Hansenclever F.

arXiv.org Artificial Intelligence | Apr-23-2024

This work investigates the potential of Reinforcement Learning (RL) to tackle robot motion planning challenges in the dynamic RoboCup Small Size League (SSL). Using a heuristic control approach, we evaluate RL's effectiveness in obstacle-free and single-obstacle path-planning environments. Ablation studies reveal significant performance improvements. Our method achieved a 60% time gain in obstacle-free environments compared to baseline algorithms. Additionally, our findings demonstrated dynamic obstacle avoidance capabilities, adeptly navigating around moving blocks. These findings highlight the potential of RL to enhance robot motion planning in the challenging and unpredictable SSL environment.

agent, obstacle, robot, (13 more...)

arXiv.org Artificial Intelligence

2404.1541

Country:

Europe > Portugal > Braga > Braga (0.04)
South America > Brazil > Pernambuco > Recife (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Leisure & Entertainment > Sports > Soccer (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Game-Theoretical Analysis of Reviewer Rewards in Peer-Review Journal Systems: Analysis and Experimental Evaluation using Deep Reinforcement Learning

Lee, Minhyeok

arXiv.org Artificial Intelligence | May-20-2023

In this paper, we navigate the intricate domain of reviewer rewards in open-access academic publishing, leveraging the precision of mathematics and the strategic acumen of game theory. We conceptualize the prevailing voucher-based reviewer reward system as a two-player game, subsequently identifying potential shortcomings that may incline reviewers towards binary decisions. To address this issue, we propose and mathematically formalize an alternative reward system with the objective of mitigating this bias and promoting more comprehensive reviews. We engage in a detailed investigation of the properties and outcomes of both systems, employing rigorous game-theoretical analysis and deep reinforcement learning simulations. Our results underscore a noteworthy divergence between the two systems, with our proposed system demonstrating a more balanced decision distribution and enhanced stability. This research not only augments the mathematical understanding of reviewer reward systems, but it also provides valuable insights for the formulation of policies within journal review systems. Our contribution to the mathematical community lies in providing a game-theoretical perspective to a real-world problem and in the application of deep reinforcement learning to simulate and understand this complex system.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2305.12088

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
